Minimum variance distortionless response on a warped frequency scale
Abstract
In this work we propose a time-domain technique to estimate an all-pole model based on the minimum variance distortionless response (MVDR) using a warped short-time frequency axis such as the Mel scale. The use of the MVDR eliminates the overemphasis of harmonic peaks typically seen in medium- and high-pitched voiced speech when spectral estimation is based on linear prediction (LP). Moreover, warping the frequency axis prior to MVDR spectral estimation ensures that more parameters in the spectral model are allocated to the low-frequency regions of the spectrum than to the high-frequency regions, thereby mimicking the human auditory system. In a series of speech recognition experiments on the Switchboard Corpus (spontaneous English telephone speech), the proposed approach achieved a word error rate (WER) of 32.1% for female speakers, which is clearly superior to the 33.2% WER obtained by the usual combination of Mel warping and linear prediction.
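As a rough illustration of the two ingredients named in the abstract, the sketch below computes an MVDR spectrum directly from its textbook definition, P(w) = 1 / (v(w)^H R^{-1} v(w)), and evaluates the frequency mapping of a first-order all-pass (bilinear) warp. It is not the time-domain warped estimator proposed in the paper. The function names, the diagonal loading term, and the warp factor alpha are assumptions made only for this example.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation estimates r[0..order] of a real frame x."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])


def mvdr_spectrum(x, order, n_freqs=256, loading=1e-9):
    """MVDR power spectrum on n_freqs frequencies in [0, pi).

    R is the (order+1) x (order+1) Toeplitz autocorrelation matrix and
    v(w) = [1, e^{jw}, ..., e^{j*order*w}]^T is the steering vector.
    The small diagonal loading is an assumption added for numerical safety.
    """
    r = autocorrelation(np.asarray(x, dtype=float), order)
    lags = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    R = r[lags] + loading * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)
    omegas = np.linspace(0.0, np.pi, n_freqs, endpoint=False)
    V = np.exp(1j * np.outer(np.arange(order + 1), omegas))  # (order+1, n_freqs)
    # Quadratic form v^H R^{-1} v evaluated for every frequency at once.
    denom = np.einsum("kf,kl,lf->f", V.conj(), R_inv, V).real
    return omegas, 1.0 / denom


def bilinear_warp(omega, alpha):
    """Warped frequency produced by a first-order all-pass with factor alpha."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )
```

For a positive alpha the mapping stretches the low-frequency part of the axis, which is the sense in which a warped model spends more of its parameters on low frequencies. The direct matrix inverse is chosen here for readability, not efficiency.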
Similar resources
Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations
Accuracy of speaker verification is high under controlled conditions but falls off rapidly in the presence of interfering sounds. This is because spectral features, such as Mel-frequency cepstral coefficients (MFCCs), are sensitive to additive noise. MFCCs are a particular realization of warped-frequency representation with low-frequency focus. But there are several alternative, potentially mor...
Warping and Scaling of the Minimum Variance Distortionless Response
Spectral estimation based on the minimum variance distortionless response (MVDR) is well-known in the signal processing literature and has been shown to be superior to linear prediction for robust speech recognition. In this work we propose two techniques to improve the resolution and the robustness of the MVDR spectral estimate: The first is a time-domain technique to estimate an all-pole mode...
Speaker dependent model order selection of spectral envelopes
This work introduces a maximum-likelihood based model order (MO) selection technique for spectral envelopes to apply speaker-dependent adaptation in the feature space, similar to vocal tract length normalization. Speech recognition systems based on spectral envelopes use a fixed MO for the underlying linear parametric model. Using a fixed MO over different speakers or channels might not be...
Frame based model order selection of spectral envelopes
Spectral envelopes, using (warped or perceptual) linear prediction or minimum variance distortionless response for the underlying linear parametric model, are widely used in speech recognition systems where the frequency resolution, namely the model order (MO), of the spectrum is kept constant. Modeling different types of phonemes such as vowels or fricatives with the same frequency resolution ...
Speaker identification using warped MVDR cepstral features
It is common practice to use similar or even the same feature extraction methods for automatic speech recognition and speaker identification. While the front-end for the former must preserve phoneme discrimination and compensate for speaker differences to some extent, the front-end for the latter has to preserve the unique characteristics of individual speakers. It seems, therefore, c...
Publication date: 2003